

Adversarial Intrinsic Motivation for Reinforcement Learning

Neural Information Processing Systems

In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.
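For intuition about the objective being minimized, here is a hedged sketch (not the paper's method): for two empirical one-dimensional distributions with equal sample counts, the Wasserstein-1 distance reduces to the mean absolute difference of the sorted samples (the quantile coupling). The sample values below are made up for illustration.

```python
# Sketch: W1 between two empirical 1-D distributions of equal size equals
# the mean absolute difference of their sorted samples (quantile coupling).
def wasserstein1(xs, ys):
    xs, ys = sorted(xs), sorted(ys)
    assert len(xs) == len(ys), "equal sample counts assumed"
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Toy "state visitation" samples vs. a target shifted by 1.0.
policy_states = [0.1, 0.4, 0.5, 0.9]
target_states = [1.1, 1.4, 1.5, 1.9]
print(wasserstein1(policy_states, target_states))  # 1.0
```

In the RL setting the distributions live over high-dimensional state spaces, so the paper's approach relies on adversarial (dual) estimates of this distance rather than a closed form.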


Visual Adversarial Imitation Learning using Variational Models

Neural Information Processing Systems

Behaviour cloning (BC) is a classic algorithm to imitate expert demonstrations [7], which uses supervised learning to greedily match the expert behaviour at demonstrated expert states. Due to environment stochasticity, covariate shift, and policy approximation error, the agent may drift away from the expert state distribution and ultimately fail to mimic the demonstrator [8].
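The failure mode described above can be seen in a minimal sketch (illustrative, not from the paper): a 1-nearest-neighbour behaviour-cloning policy that copies the expert's action at the closest demonstrated state. Near the demonstrations it imitates well; once the agent drifts outside the demonstrated state distribution, its predictions are unconstrained. The demonstration pairs are hypothetical.

```python
# Minimal BC sketch: greedily match the expert action at the nearest
# demonstrated state. Demo (state, expert_action) pairs are hypothetical.
demos = [(0.0, -1.0), (1.0, 0.0), (2.0, 1.0)]

def bc_policy(state):
    # Supervised "greedy match": act like the closest expert state.
    nearest_state, action = min(demos, key=lambda sa: abs(sa[0] - state))
    return action

print(bc_policy(1.1))   # 0.0: near a demo state, imitation is accurate
print(bc_policy(10.0))  # 1.0: far from the data, the policy just extrapolates
```

Adversarial imitation methods such as the one in this paper address the drift by matching state distributions rather than per-state actions.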





Visual Adversarial Imitation Learning using Variational Models

Neural Information Processing Systems

Reward function specification, which requires considerable human effort and iteration, remains a major impediment to learning behaviors through deep reinforcement learning.


Supplementary Materials, A. Experiment: As suggested by one reviewer, we conduct the following experiment on CartPole in OpenAI Gym to …

Neural Information Processing Systems

The following lemma justifies item 3 in Assumption 1. Consider the following two cases: 1. the density function of the policy is smooth, i.e. … We then show how Theorem 4 implies Theorem 1. Assumption 3. For all x ∈ X, there exist constants such that the following hold: 1. for all x, we have … Now we proceed to prove the main theorem. Then, given the above convergence result on the gradient norm, we proceed to prove the convergence of NAC in terms of the function value.
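The last step quoted above, passing from a gradient-norm guarantee to a function-value guarantee, typically relies on a gradient-domination condition. A hedged sketch of that implication follows; the constants $C$ and $\epsilon$ are assumptions for illustration, not taken from this snippet.

```latex
% Assumed PL-type (gradient-domination) condition; C and \epsilon hypothetical:
\[
  f^{*} - f(\theta) \;\le\; C \,\lVert \nabla f(\theta) \rVert + \epsilon
  \qquad \text{for all } \theta .
\]
% Given the gradient-norm guarantee
% \( \min_{t \le T} \lVert \nabla f(\theta_t) \rVert \le \delta_T \),
% the iterate attaining the minimum satisfies
\[
  f^{*} - \max_{t \le T} f(\theta_t) \;\le\; C \, \delta_T + \epsilon ,
\]
% so convergence of the gradient norm yields convergence in function value.
```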



Frictional Q-Learning

Kim, Hyunwoo; Lee, Hyo Kyung

arXiv.org Artificial Intelligence

We draw an analogy between static friction in classical mechanics and extrapolation error in off-policy RL, and use it to formulate a constraint that prevents the policy from drifting toward unsupported actions. In this study, we present Frictional Q-learning, a deep reinforcement learning algorithm for continuous control that extends batch-constrained reinforcement learning. Our algorithm constrains the agent's action space to encourage behavior similar to that in the replay buffer, while maintaining a distance from the manifold of the orthonormal action space. The constraint preserves the simplicity of the batch-constrained approach and provides an intuitive physical interpretation of extrapolation error. Empirically, we further demonstrate that our algorithm trains robustly and achieves competitive performance on standard continuous control benchmarks.
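The batch-constrained idea this work extends can be sketched as follows. This is a hedged, BCQ-style illustration, not the Frictional Q-learning algorithm itself: the greedy argmax over a Q-function is restricted to candidate actions close to actions observed in the replay buffer, so the Q-function is never queried on unsupported actions where extrapolation error dominates. All values below are toy assumptions.

```python
# Hedged sketch of batch-constrained action selection (not the paper's
# algorithm): restrict the greedy argmax to actions near the replay buffer.
def constrained_greedy(q, candidates, buffer_actions, max_dist):
    # Keep only candidates within max_dist of some action seen in the buffer.
    supported = [a for a in candidates
                 if min(abs(a - b) for b in buffer_actions) <= max_dist]
    return max(supported, key=q)  # greedy only over supported actions

q = lambda a: -(a - 3.0) ** 2      # toy Q-function peaking at a = 3.0
buffer_actions = [0.0, 0.5, 1.0]   # actions actually present in the buffer
candidates = [0.0, 1.0, 2.0, 3.0]  # unconstrained argmax would pick 3.0
print(constrained_greedy(q, candidates, buffer_actions, max_dist=0.25))  # 1.0
```

The unconstrained maximizer (3.0) has the highest estimated value but no support in the data; the constraint forces the choice back to 1.0, the best action the buffer can vouch for.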